Python web crawling

httplib2

An automatic-login script using httplib2. The example comes from the httplib2 documentation: code.google.com/p/httplib2/wiki/Examples

Grab the cookie from the response to the HTTP POST, then use that cookie in all subsequent interactions with the server.

#!/usr/bin/env python
# Python 2 example (urllib.urlencode moved to urllib.parse.urlencode in Python 3)
import urllib
import httplib2

http = httplib2.Http()

# Log in by POSTing the form-encoded credentials
url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

# Reuse the session cookie returned by the login response on the next request
headers = {'Cookie': response['set-cookie']}
url = 'http://www.example.com/home'
response, content = http.request(url, 'GET', headers=headers)
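The script above is Python 2. A minimal sketch of the same login flow under Python 3, assuming the same placeholder example.com endpoints and that the server sets a session cookie on login, might look like this:

#!/usr/bin/env python3
# Sketch only: the URLs and form field names are placeholders, not a real site.
from urllib.parse import urlencode
import httplib2

http = httplib2.Http()

# POST the credentials as a form-encoded body
response, content = http.request(
    'http://www.example.com/login', 'POST',
    headers={'Content-type': 'application/x-www-form-urlencoded'},
    body=urlencode({'USERNAME': 'foo', 'PASSWORD': 'bar'}))

# Send the returned session cookie back with the follow-up request
response, content = http.request(
    'http://www.example.com/home', 'GET',
    headers={'Cookie': response['set-cookie']})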

The data exchange can be captured and analyzed with Wireshark, viewed directly with the Live HTTP Headers extension in Firefox, or inspected with the GET and POST commands on Linux (man GET, man POST, man lwp-request).
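For a quick look at what the server returned without reaching for a packet capture, printing the httplib2 response headers is often enough. A small sketch, assuming response and content come from the POST request above:

# httplib2's Response object is a dict of headers plus a .status attribute
print(response.status)                # HTTP status code of the reply
for name, value in response.items():  # header names are lowercased by httplib2
    print('%s: %s' % (name, value))   # e.g. set-cookie, content-type, ...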